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Abstract 

Non linear mixed effect models are classical tools to analyze non linear longitudinal data in 
many fields such as population Pharmacokinetic. Groups of observations are usually compared 
by introducing the group affiliations as binary covariates with a reference group that is stated 
among the groups. This approach is relatively limited as it allows only the comparison of the 
reference group to the others. In this work, we propose to compare the groups using a penalized 
likelihood approach. Groups are described by the same structural model but with parameters that 
are group specific. The likelihood is penalized with a fused lasso penalty that induces sparsity 
on the differences between groups for both fixed effects and variances of random effects. A 
penalized Stochastic Approximation EM algorithm is proposed that is coupled to Alternating 
Direction Method Multipliers to solve the maximization step. An extensive simulation study 
illustrates the performance of this algorithm when comparing more than two groups. Then the 
approach is applied to real data from two pharmacokinetic drug-drug interaction trials. 

Keywords: Nonlinear mixed effect model, SAEM algorithm, fused lasso, group comparison, 
pharmacokinetic 


1. Introduction 

Non Linear Mixed Effects Models (NLMEMs) are used to model and analyze longitudinal data 
in several fields, especially in clinical trials and population Pharmacokinetic (PK). In clinical 
research, observations may present a group structure corresponding to the different treatment 
modalities. For example, a drug-drug interaction clinical trial between two compounds includes 
two groups of observations, patients treated with the molecule of interest and patients treated 
with the two compounds. When population PK data have been collected during the trial, PK 
parameters are estimated through an NLMEM, and the interaction (existence and mechanism) is 
assessed through the variation of the PK parameters across groups. 

Statistical tests are classically used to identify significant influence of the group structure on 
a (PK) parameter. The group affiliation is included as a categorical covariate and its influence 
is studied with maximum likelihood tests ( jSamson et al.| |2007[ ). Note that the likelihood of 
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NLMEM being intractable, stochastic versions of the EM algorithm, among other methods, can 
be used to estimate the parameters, especially the SAEM algorithm ([Delyon et al. 1999 Kuhn 


and Lavielle 20051. A stepwise procedure combined to a BIC criterion is then used to select 


the best model among the collection of models with the group affiliation covariate on each pa¬ 
rameter. A drawback of this approach is that a reference group has first to be stated, and then 
only differences with this reference group are considered. When there are more than two groups, 
this does not allow to select a model with no difference between two non reference groups. In 
order to study the differences between non reference groups, combination of the group covariates 
could be used, but their number increases rapidly with the number of groups. Indeed, the num¬ 
ber of between group differences models is equal to (Bc)^ where Be is the Bell’s number (|Bell 


|1934[ ) for G groups and p the number of studied parameters. Considering 5 groups and studying 
between group differences on 3 parameters leads to 52^ possible models. 

Nevertheless, the relevance of group differences between all the groups can be directly studied 
using a penalized joint modeling approach ([Viallon et al. [2014 Gertheiss and Tutz 2012[ Ollier 


and Viallon} 2014|l. The same structural model is applied to each group with a structural sparsity- 


inducing penalty ( |Bach et ak] |201 1| | that encourages parameters to be similar in each group. 
In this work, group parameters are estimated by maximizing the penalized likelihood with a 
fused lasso penalty. This penalty was originally designed to penalize differences of coefficients 
corresponding to successive features ( [Tibshirani et al. 2005 [ ) and has been generalized to account 
for features with a network structure (Hofling et al. 2010| l. 

Sparsity inducing penalties in linear mixed effects models (LMEMs) have been proposed for 
selecting fixed effects only (Schelldorfer et al. 2011 Rohart et al.[ 2014[ ) and both fixed effects 
and random effects variances (Bondell et al. 2010| l. Note that the joint selection of fixed effects 
and random effects variances is complex because the likelihood is not convex with respect to 
the variances. The difficulty increases with NLMEM as the likelihood is intractable (contrary 
to LMEMs), and only a few papers deal with penalties in NLMEM. Arribas-Gil et al. ( |2014| l 
study selection of semi parametric NLMEM using a lasso penalty, the lasso selection step and 
the parameter estimation being realized separately. Bertrand and Balding p013| l consider /j 
penalized NLMEM for genetic variant selection. They propose a penalized version of the SAEM 
algorithm, in which the maximization step corresponds to an /i penalized weighted least square 
problem. The optimal sparsity parameter is set using an asymptotic estimation. The recent 
stochastic proximal gradient algorithm ( |Atchade et al.[ |2014) l could offer an interesting solution 
to optimize penalized intractable likelihood but it has not been applied to NLMEMs. Up to our 
knowledge, no work investigates the use of a structured penalty in the context of NLMEM. 

The objective of this paper is to incorporate the fused lasso penalty in the SAEM algorithm, in 
order to jointly estimate NLMEMs on several groups, and detect relevant differences among both 
fixed effects and variances of random effects. Penalties are introduced in the maximization step 
of the SAEM algorithm. Eixed effects and variances of random effects are penalized through a 
sum of absolute differences. The penalized differences correspond to edges of a graph in which 
the vertices correspond to the groups. Solving this penalized optimization problem is not trivial 


and we suggest to use an Alternating Direction Method of Multipliers (ADMM) (Boyd et al. 


20111. The direct penalization of the variances yielding to a non convex optimization problem. 


we propose to penalize the inverse variance-covariance matrix, assuming the matrix is diagonal. 
An ADMM algorithm can be used to solve the penalized optimization problem, its proximal step 
being explicit or not, depending on the number of groups. We also consider weighted penalties, 
following the ideas of the adaptive Lasso (Zou 2006) 1. The selection of the two tuning parameters 
introduced in the two penalties is realized using the BIC criterion. 
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The paper is organized as follows. Section [^introduces NLMEM and the SAEM algorithm. 
In Sectionj^we introduce the fused lasso penalty, the penalized SAEM algorithm and the tuning 
parameter selection. In Sectionj^ the penalized-SAEM algorithm is evaluated on simulated data 
with 2 groups or more. Einally, it is applied to real data from a cross over clinical trials studying 
drug-drug interaction between dabigatran etexilate and 3 other drugs in Section]^ 


2. Joint estimation of multiple non linear mixed effects models 


2.1. Group structured NLMEMs 

Let j be the observation at time t^j {j e {1,... ,n^)) for the /-th patient (i e {!,... ,Ng}) in 
the g-th group (g e G)). We consider models of the form: 

ef. ~ 7V(0,1) (iid) 

where / and d are two given non linear functions. The function d corresponds to the error model 
and is assumed to be c/ = af + b with a and b two real constants. Measurement errors are 
further assumed to be independent and identically distributed. Individual parameters (p^. for the 
/-th subject in group g is a p-dimensional random vector, independent of and assumed to be 
decomposable (up to a transformation h) as: 


/2(0f) = 


yU* + b‘ 

N{0,D.^) (iid) 


where 6 is the mean parameter vector for the group g and b^. 6 R'’ the random effects 
of the /-th patient. Different transformations h can be used. Here we use the common one 
h(x) - log(.r), that yields log-normally distributed In this work, is supposed diagonal as 


explained in section 3.1.2 


The log-likelihood then takes the form: 


LL(0) = logp(y;0) = log 



( 1 ) 


where p(y^, 0*; 0®) is the likelihood of the complete data in group g: 


logp(/, 0®; 0®) = - ^ log(d(tlj, ^f)) 




'(<^f -jU®) - log(2;r) 


- -j- log(|Q®|) 


with 9 - (0\ ..., (P) and 0® = (yu®, Q®, a, b) the parameters to be estimated. Note that the log- 
likelihood LLiO) as defined in Equation Q has generally no closed form expression because of 
the non linearity of / with respect to (p. 
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2.2. SAEM algorithm for the joint estimation problem 


In this section, we present a standard version of the SAEM algorithm in the context of joint 
estimation that will be the base of the penalized version that we introduce later. Here, we do not 
account for potential similarities of the parameters across groups. 


The SAEM algorithm is a classical tool for parameter estimation of NLMEMs (Delyon et al. 


19991. It iteratively maximizes the conditional expectation of the complete data log-likelihood. 


At iteration k given the current estimates 9k-\, the problem reduces to the optimization of the 
following criterion: 


G G 

Qk(0) ^J]Qk(e^) - 

«=1 S=1 

As this conditional expectation has no closed form for NLMEMs, it is approximated using a 
stochastic approximation scheme. The E-step of the classical EM algorithm is then divided in 
two parts, a simulation step where individual parameters are simulated using a Markov Chain 
Monte Carlo method (MCMC) and a stochastic approximation step (Kuhn and Lavielle 2005) 1. 
At iteration k of the SAEM algorithm we have: 

1. Estimation step (E-step): 

(a) Simulation step: draw 0* using an MCMC procedure targeting p(.|y®, ^_j)- 

(b) Stochastic approximation step of Qk(d)' update QkiO) using the following scheme 

Qim = QUm + r-tdog p(y^<Pt;0^) - QUm) 

where jk is a decreasing sequence of positive numbers. When the complete data like¬ 
lihood belongs to the exponential family, this step simply reduces to the stochastic 
approximation of its sufficient statistics s^^. and j,: 


/ N, 


IJ.k 


^2,k 


•"3,k 


^i,kk-i + n 


lj,k-l 


= ^2,k-i+ri< 


Vi'=l 

iKk4k-4,k-i 

i=\ 

,k-i + 7k (Sy [y-j - /(fy, 4>lk)f - if ^ = 0 
+ yk{y ■ ■ 

.k-3 ^ kk I ) ■*3„ 


if a = 0 


2. Maximisation step (M-step): update of population parameters: 

Ok = ArgMax Qk{0). 
e 

Within the exponential family, the solution is explicit for and Q*: 




1 
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For parameters a and b, they are updated using the whole data set because they are common 
to all the groups. An explicit solution exists when a = 0 or b - 0: 




When a 0 and b ^ 0, the maximization problem has to be solved numerically. 

Thus, except for a and b, SAEM algorithm for the joint estimation problem is implemented as if 
the G groups were analyzed separately. 

3. Penalized joint estimation of group structured NLMEM 

The previous SAEM algorithm corresponds to parameters estimated within each group. But 
groups can be expected to share common characteristics, so that theoretical parameters are ex¬ 
pected to exhibit similarities. Therefore, we introduce a penalty within the SAEM algorithm that 
encourages parameters to be equal. We first detail the fused penalties, then the penalized SAEM 
algorithm and finally the selection of the two tuning parameters introduced in the penalties. 

3.1. Fused lasso penalties for group comparison 

The fused lasso penalty encourages parameters to have the same value between two groups. 
This is particularly useful when theoretical parameters of (at least some of) the groups are ex¬ 
pected to be similar and/or when the objective of the study is to assess potential differences 
between groups. Depending on the context, differences between all the groups or only some 
specific differences might be of interest. Likewise, similarity of some parameters does not nec¬ 
essarily hold for all the groups. These differences and similarities of interest can be described 
with a graph that links groups together. Two groups are related in the graph if the comparison of 
these two groups is of interest, or if parameters are assumed to be similar in these two groups. 
Of course, any graph structure can be put forward, but some of them are naturally appealing in 
various contexts (see figure[T]with G - 4); 

• Clique Graph: no assumptions on the hierarchical structure of the groups are made. All the 
possible differences between group parameters are penalized. 

• Star Graph: a reference group is stated and only the differences between the reference group 
and the others are penalized. This is equivalent to the classical approach based on group 
affiliation covariate. 

• Chain Graph: when groups can be naturally ordered. 

Given a specihc graph described by its edge set £, we introduce the penalties for the fixed and 
the variance parameters. 
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Figure 1: Examples of graphs for G = 4 groups 


3.J.J. Fixed parameters 

For fixed parameters (ju',... the fused lasso penalty corresponds to: 

(gi,g2)e£, 

where ||x||i = 2; \^i\ the Zi-norm. The fused lasso penalty encourages the fixed parameters to 
be the same between two groups connected in the graph. 


3.1.2. Variance parameters 

Concerning random effect variances, components of the variance covariance matrix should be 
penalized. However, the resulting optimization problem is not convex. Some algorithms have 
been proposed to solve simple /] penalty problems ( Bien and Tibshirmil 201 1[ Wang 2013| l 
but they are computationally demanding and their extension to the fused penalty context is not 
straightforward. However under the assumptions that is diagonal, a simple alternative consists 
in penalizing the inverse variance covariance matrix. Then the penalized optimization problem 
becomes convex and can be solved efficiently. This corresponds to the following penalty: 




...Q^ 


')= 2 11 “"' 
(gi,«2)e£ 


II, = 


Z Z 

'=1 («i,K)e£ 




Penalizing differences between is not equivalent to penalizing differences between as 
|Q^.‘ - I Some issues could occur when considering parameters with very 

different levels of variability: the penalty rapidly discards differences for parameters with low 
variance. This issue is mitigated when working with log-normally distributed individual param¬ 


eters, and adaptive weights can further help to prevent such a behavior (see section 4.2 1 . 


3.1.3. Matricial formulation and adaptive weights 

Weights (7r,v) can be introduced in order to take into account some prior information: 

(gi,g2)e6 /=1 

^ -Qf |. 

(gi,g2)£B i=l 
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These weights can be based on initial maximum likelihood estimates within each group 
following the idea of the adaptive fused lasso (Viallon et al. 2014 1 : = |/i^' - and 

|““ for some a > 0 (typically a - 1). These weighted penalties are particularly 
helpful to compute unpenalized re-estimation of the selected model (section [33] l. 

Finally, the weighted penalties with weights n and v can be written in a matricial form: 


Pi/(Q'^',...Q^^‘) = ||v.Fdiag(Q-')||i 

where the matrix P € {-1,0,1)'^'^'^^ encodes the penalized values of p = . ,f/’y and 

diag(Q“*) = (diag(Q' diag(Q'^ ))' and • stands for the Hadamard product. 

3.2. Penalized version ofSAEM algorithm 

The penalized SAEM algorithm consists in iteratively maximizing the penalized stochastic 
approximation of the conditional expectation QkiS). 

Qk(e) - AfPf(m\. TvPv(Q'^‘,...Q^^') 

where Af and Ay are two tuning parameters that control the penalty strength and that have to be 
calibrated. When Af — Ay = 0, the estimates correspond to the classical maximum likelihood 
estimates. For large enough values, the vector of penalized differences is set to zero {Pfi = 0 
and/or f’diag(Q ') = 0). The penalized SAEM is the standard SAEM except for the M-step: 
a fused lasso penalized regression problem is solved for both fixed effects and random effects 
variances updates, with fixed tuning parameters Af and Ay. At iteration k, it corresponds to (Box 
1 ): 


Box 1: Maximization step of the penalized SAEM algorithm 

1. Fixed effects update: 


' G 

= ArgMax ^ Vi) - AfPf(p\ ■ ■ ■ 


2. Random effects variances update: 

G 


..., 0.9) - ArgMax 


3. Error model parameters update: usual update. 


Y^Qk(iil0^ak-x,bk-i) - AyPy{Q}",...,Qp") 
Vg=i 


We now turn to the description of the two update steps for the fixed effects and the random 
effects variances. 
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3.2.1. Fixed effects update 

For fixed effects update, the conditional expectation of the complete likelihood reduces to the 
following weighted least square function: 

G _ j 

Qkijj) ,^l^^,ak-\,bk-\) = C - - 

>;=i 


G , 

ZZ 

j;=i 


-F 




,k 




where C is a constant not depending on p^. The matricial form of the problem to be solved is: 


{pl,...,pf') = ArgMax2*(/z)-Tf||P/r||i. 


( 2 ) 


This optimization problem corresponds to an extension of the generalized fused lasso of Hoflmg 


et al. ( 2010| l where the least-squares is replaced by weighted least-squares. It can be solved with 
the Alternating Direction Method of Multipliers (ADMM) (Boyd et al. [2011) 1, that breaks the 
convex optimization problem into small pieces. More precisely, problem Q can be rewritten as 
an equality constraints optimization problem, where p is split in two parts p and z: 


/l = ArgMin - Qic(p) + Af\\z\\i. 

fi,Z 

s.t Pp - z - 0 

The ADMM algorithm solves Q by iteratively solving smaller (and easier) problems for each 
primal (p, z) and dual (u) variables separately using the augmented Lagrangian formulation: 

ArgMin ArgMax - Qk(p) + df ||z||i -H {u,Pp - z) + ^\\Pp -z + u\\\.. 

Here p is the augmented lagrangian parameter (generally set to 1) and || ■ II 2 the l 2 -norm. This 
corresponds to applying the steps presented in Box 2 at each iteration q until convergence. When 
adaptive weights are included in the penalty, the same algorithm can be used except that the 
tuning parameter Ap is replaced by the vector Ap • n. 


3.2.2. Variance Covariance matrix of random effects update 

The conditional expectation of the complete likelihood for group g is: 

Vi) = C-log|Q«| -Trace 

where C is a constant not depending on Q* and 2^ corresponds to the solution of the non penal¬ 
ized problem: 




1=1 





Then the problem to be solved is: 

' G 

(^1,... = ArgMax - ^ (log |Q^| 




\ 8=1 


-H Trace - AyPyi^^^', ■ ■ ■ 


(3) 
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Box 2: ADMM algorithm for fixed effects update 

1. Initialization; yuo = jJ-k-x, Zo - 0, uq - 0 

2. For ^ = 0,1,2,... until convergence: 

(a) yU update: 

= ArgMin^-(2i(yU) + ^WP^l-Zq + Ug\\l^ = (A +pP'P)^'(r +pP‘{Zq - Uq)) 

with r = diagczfjy n]:; s \.„..., 

and A = diag(AiQ^ ‘j, .. .,Nc^'^_[) 

(b) z update: 

z,+i = ArgMin^^||Fp^+i + Ug - z\\l + df||z||ij = SiriPpq+i + Ug) 

with the soft thresholding operator .S^(x) = sgn{x){\x\ - /l)+ 

(c) dual update: 

— Uq + Ppq+X Zq-^X 


Danaher et al. (2013|l consider a similar optimization problem (for joint graphical models) and 


propose an ADMM algorithm to solve it. We apply the same methodology here; problem Q has 
the following scaled augmented Lagrangian: 


ArgMin ArgMax 


(log m + Trace [Q.«^'tl]) + AvPv{Z\. 

+ -z^ + u% 


where (D.^ )g=x,...,GXZ^)g^x,....c are the primal variables, (U^)g=x,...,c the dual variables and p the 
augmented lagrangian parameter (generally set to 1). The ADMM algorithm consists in applying 
the steps presented in Box 3 at each iteration q until convergence. Step 1 has an explicit solution 
(Witten and Tibshirani 2009). Step 2 is the evaluation of the Py’s proximal operator. An explicit 
formula is available when G - 2 {Danaher et al. 2013[ ), but for G > 2 it has to be numerically 
approximated. This extends the computing time significantly. As for fixed effects, when adap¬ 
tive weights are included in the penalty, the same algorithm can be used except that the tuning 
parameter Ay is replaced by the vector Ay • v. 


3.3. Selection of the tuning parameters 

The described SAEM algorithm is applied with a fixed value of the tuning parameters A = 
(T/r, Ay). The value of these tuning parameters varying from zero to infinity, the SAEM algorithm 
selects a collection of models with a decreasing number of between group differences (from the 
full model to the model with no difference at all). The optimal A can be selected using the 
Bayesian Information Criterion (BIC): the optimal A is defined as returning the model with the 
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Box 3: ADMM algorithm for variances update 

1. Initialization; Qg = Z® = 0, I/g = 0 

2. For ^ = 0,1,2,... until convergence: 

(a) Q update; for g - 1,..., G 

Q*;‘i = ArgMin (log|Q*| + Trace(£fQ^^")) + - Z, + U,\\l 

n*-' ^ 


(b) Z update: 


G 




^ - Z^ + U% + AvPv(Z\. ..,Z^) 


\s=i 


(c) dual update; for g - 1,..., G 


u\, = Gf + - z\, 

q-\-\ •? q+\ q+\ 


minimal BIC. In practice, we run the algorithm on a user-defined grid (Ai,..., Am). Then the 
optimal value Abic is: 


Abic - ArgMin BIC(A) 
Ag{Ai,...,Am} 


where BIC(A) is the criterion of the model corresponding to the value A. For a NLMEM with 


random effects on all the parameters, the BIC criterion is defined as (Delattre et al. 20141: 


BIC = -2LL(e) + log(A) X df(0), 

where LL(0) is the log likelihood Q, df{6), the degree of freedom, is the number of distinct fixed 
effects and random effects variances in the selected model. For a given A, the penalized SAEM 
algorithm estimates a model (0 a) with a particular structure: some parameters have the same 
estimated value (their difference is set to 0). Eollowing the Lars-OLS-Hybrid algorithm ( |Efron| 
et al. 2004| l (that corresponds to a relaxed lasso (Meinshausen 2007| l with relaxing parameter 
set to 0), an unbiased estimation 6a of the parameters of this selected model is obtained by 
reestimating 6 with a constrained SAEM algorithm; 


6a - ArgMin - 2LL(6) 

e 

/’diagn ) = /-dia^^A ) 

where S (jc) is the support of vector x. The constraint on the support ensures that the solution 
of the constrained optimization problem has the same structure as the solution of the penalized 
estimate. This constrained optimization problem can be solved by the penalized SAEM algo¬ 
rithm with appropriate choices for the adaptive weights: non-null differences are attached to null 
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weights (and are therefore not penalized) and null differences are attached to weights that are 
high enough to force them to be null in the solution 0^. Finally, we take: 

B/C(A) = -2LL(0a) + log(fV) X dfifitO. 

4. Simulated data analysis 

Simulations are performed with the one compartment model with first order absorption: 

fit, K, Cl, V) = (4) 

Vka - Cl 

where ka ih~^). Cl {L.lT^) and V (L) correspond respectively to the absorption constant, the 
clearance and the volume of distribution parameters. The administrated dose (D) is set to 6 mg. 

First, the behavior of the penalized SAEM algorithm is illustrated with one dataset simulated 
with 3 groups of subjects. Especially, regularization paths depending on the values of the spar¬ 
sity parameter A is presented. Then the impact of high variances on the penalized estimation is 
studied on one dataset simulated with 3 groups of subjects, and the benefit of adaptive weights 
introduced in the penalty is shown. Next the influence of the penalty structure on selection per¬ 
formance is studied on 100 simulated data sets with 5 groups of subjects. Einally, joint fixed and 
variance parameters selection performance is compared between the penalized SAEM algorithm 
and the standard stepwise forward approach, on 50 simulated data sets with 2 groups of subjects. 

4.1. Behavior of the penalized SAEM algorithm 

One dataset of 3 groups with Ng = 100 subjects per group has been simulated using model 0 
and fixed effects parameters: 


Py-Py- 0.48 Py - 0.58 
Bci ~ Bci - Bci - 0.042 
pI^pI^I. 47 pl-2.m 

Random effects variances are: 

(Cj[f = (coif = (4)" = 0.1 
(colif = (coif = 0.1 (coif = 0.2 
(colf^O.l (coif ^0.3 = 0 - 2 - 

Individual parameters are log-normally distributed (h(<p) = log(0)). Error model parameters 
are set to a = 0 and b = 0.1. The penalised SAEM algorithm is implemented with 400 iterations. 
During the first 300 iterations, we use a constant step size equal to 1. Then, during the last 100 
iterations, the stochastic approximation scheme is implemented with a step size equal to 
at iteration iter. The evolution of each SAEM parameter estimate is plotted along iterations 
in Eigure|^for Af - 37 and Ay - 0.015 using a clique graph as penalty structure. In this 
example, the number of iterations has been chosen such that the convergence is clearly attained 
for all the model parameters. Eor these values of d/r and Ay, the model selected by the algorithm 
corresponds to the simulated one. Eigurej^ represents the regularization path of the estimates 
for both fixed effects and variances of random effects parameters. When increasing Af (or Ay) 

11 






0 100 200 300 400 

SAEM iteration 


0 100 200 300 400 


Figure 2: Simulated data, 3 groups: evolution of SAEM estimates with Af = 25 and Ay = 0.013. Red, blue and green 
curves correspond to estimates of group 1, 2 and 3, respectively. 



log A 


Figure 3: Simulated data, 3 groups: regularization paths of SAEM estimates for fixed effects and random effect variances. 
Red, blue and green curves correspond to estimates of group 1, 2 and 3, respectively. Solid black lines corresponds to 
the lambda values used in Figure[2(/l/r = 25, Ay = 0.013). 
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G=2 G=3 G=5 


Fixed 

23 s 

24 s 

99 s 

Fixed -H Variances 

32 s 

210 s 

411 s 


Table 1: Simulated data, 2, 3 or 5 groups: computational time of the penalized SAEM algorithm (400 iterations) with 
small tuning parameter values on a simulated data set of Ng = 100 subjects per group (G = 2, 3 or 5) with or without 
penalty on variance parameters. A clique graph is used for the penalty. 


values, differences between estimates get smaller and smaller until being null. The number of 
null differences increases with the value of A. 

As we have seen in the algorithm description, when G > 2 the proximal operation of the vari¬ 
ances penalty needs to be approximated numerically. The computational time is then increased 
when variances are penalized. Table [^presents the computational times for different numbers of 
groups when variances are penalized or not. These computational times vary in function of the 
values of Af and Ay', Table [T] corresponds to a worst-case scenario (small values of Af and Ay). 


4.2. Effect of variances rescaling with adaptive weights 


As discussed in Section 3.1.2 penalizing the inverse of covariance matrix is not equivalent to 
penalizing directly the variances. It could favor differences from parameters with a high variance 
and then select false models. This can be attenuated by rescaling the variances with adaptive 
weights. We propose the following weighting strategy: 




tei,«2)e£ i=l 


Ai 


Vi = 






where £2^ corresponds to the unpenalized estimation of Q^. To illustrate this, a data set of 3 
groups (100 subjects per group) is simulated using model Q with larger coy and smaller coka- 

= (ujlf = (4)2 = 0.3 

( 4)2 = ( 4 /- 0 . 1 , ( 4 ) 2 - 0.2 

(a>l f - 0.03, (coif = 0-075, (coif = 0.06. 

Figure 1^ presents the regularization path of estimates for (w® )^,(f)^,(ff using a clique 
structure for the penalty. Because the (coy)^ terms are all equal, we would hope estimates of 
these terms to be fused before the (co^ )2 and (cff terms. This is not the case without adaptive 
weights, and as a consequence, the optimal model is not spanned in the regularization path. 
Adaptive weights correct for this defect and the optimal model is spanned by the regularization 
path (blue shaded areas in Figure|^). 


4.3. Model Selection Performances 

4.3.1. Influence of penalty structure and adaptive weights 

We study the selection of fixed effects differences between groups on simulated data sets and 
the impact of the penalty structure on the proportion of correctly selected models. One hundred 
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Figure 4: Simulated data, 3 groups, large tjy, small Wh^'- regularisation paths of SAEM estimates for random effect 
variances with (ADAPTIVE) or without (CRUDE) adaptive weights. Red, blue and green curves correspond respectively 
to the estimates in group 1, 2 and 3. Blue shaded areas correspond to the values A that return the simulated differences 
model. 


datasets are simulated with 5 groups of subjects using model Q {Ng - 20 or Ng - 100). Fixed 
effects parameters are set to: 

fiy = 0.48 0.72 ^ = 0.96 

Fi:,-F^/ = 0.06 ;/3 ,=a,4, = 0.03 = 0.015 

= 1^1 = 1^1 = 1 - 47 . 

Random effects variances are set equal to 0.1 for all the parameters. Individual parameters are 
log-normally distributed. Error model parameters are set to a = 0 and h = 0.1. For each data set, 
a model is selected using the fused lasso approach on a grid of 100 Af values with 4 different 
penalty structures: 

• CH\ chain graph (with adaptive weights or not) 

• CL: clique graph (with adaptive weights or not) 

• 5 1 : star graph with group 1 as reference (with adaptive weights or not) 

• Sy. star graph with group 3 as reference (without adaptive weights) 

Note that the optimal graph would penalize only the null differences that appear in the simulated 
model. Thus none of these graphs is optimal for all the parameters. The optimal structure 
for parameter is a clique structure because its value is the same for all the groups. The 
most appropriate stmcture for parameters pc; fiy is the chain structure that penalizes all 
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Ns- 

20 subjects per group 

Ns- 

100 subjects per group 


CH 

CHa 

CL 

CLa 

CH 

CHa 

CL 

CLa 

Whole 

15% 

39% 

8 % 

32% 

25% 

59% 

28% 

55% 

V 

53% 

71% 

33% 

56% 

54% 

80% 

52% 

81% 

Cl 

41% 

86 % 

29% 

69% 

55% 

78% 

54% 

70% 

K 

61% 

68 % 

64% 

81% 

77% 

75% 

77% 

87% 


Table 2: Simulated data, 5 groups: Proportion of correctly selected model over 100 simulations for the whole fixed effects 
model and fixed effects model of V, Cl or ka. Different penalty stioictures: clique (CL), adaptive clique (CLa), chain 
{CH) and adaptive chain {CHa)- 


theoretically null differences (unlike star graph) and less non null differences than the clique 
graph. As previously suggested by ( |Viallon et al.[ 2014| l, adaptive weights should overcome the 
errors due to graph misspecification. Thus the penalized SAEM algorithm is implemented with 
the 4 different graphs with and without adaptive weights. 

The performance of each penalty structure is evaluated by comparing the selected model {Pfis) 
to the true model (fp) for each parameter, on each simulated data set (s = 1 ,..., 100 ) with the 
final estimate obtained by the fused lasso procedure and P a penalty matrix that encodes the 
differences under study. For example, when P - Pch, P corresponds to the (G - 1) x G matrix 
with Pi i = 1, Pi^i+i = -1 and 0 elsewhere. When considering the whole fixed effect model, the 
number of correctly selected model is: 


j 100 

^Pfiv,s=Pt^v ^ ^Pfia.s=Ppci ^ ^Ppka,. 


=PPka ■ 


Table 1^ shows the results with P = Pen for CH and CL as they are the only two penalties 
that could select exactly the true model. When Ng - 20, the chain graph has better performance. 
When Ng is large, performances of chain and clique graphs are very close. In addition, using 
adaptive weights clearly improves performance: in particular, the clique-based approach per¬ 
forms similarly to the chain-based one with adaptive weighs, even for Ng - 20. Thus clique 
graph is a good candidate when there is no prior information on data structure. This results tends 
to confirm the asymptotic optimality result of the clique-based strategy with adaptive weights 
that was obtained for generalized linear models ( |Viallon et al.||2014| ). 

Tablej^shows the results when P is a star graph (P - Ps^ for 5 1 , 5 14 and P - Ps, for S 3 ) that 
does not correspond to the real structure of the data: here only differences based on a star graph 
can be selected while theoretical parameters do not follow a star graph. When using a star graph 
penalty, the group of reference has a non negligible influence on the results. It is particularly true 
for fici- Indeed when group 1 is set as reference, theoretical values of and are 

distributed in an unbalanced way around juL (yuL, yU^y and ju^y are lower than //^y). The penalty 
unexpectedly tends first to fused yuj,y with yu^y, jujy and ju^y. The adaptive version of the penalties 
seems to mitigate this phenomenon when sample size is large (Ng = 100). This behavior is not 
observed when group 3 is set as reference, probably because theoretical parameters value of non 
reference groups are distributed in a more balanced way around yu^y. 
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Ns- 

20 subjects per group 

Ns- 

100 subjects per group 


Si 

S 1,A 

53 

Si 

S 1,A 

53 

Whole 

1% 

6% 

26% 

8% 

56% 

49% 

y 

69% 

72% 

57% 

100% 

100% 

90% 

Cl 

36% 

87% 

83% 

29% 

77% 

93% 

ka 

12% 

23% 

58% 

28% 

63% 

60% 


Table 3: Simulated data, 5 groups: Proportion of correctly selected model over 100 simulations when edges under study 
con'esponds to a star graph. Results are given for the whole fixed effects model, fixed effects of P, Cl or k„. Different 
penalty structures are considered: star with group 1 as reference (S i), star with group 3 as reference (S 3 ) and adaptive 
star with group 1 as reference (Si). 


4.3.2. Fixed and variance parameters selection 

Joint selection of fixed effects and random effects variances is evaluated through 50 simulated 
datasets using model Q with only two groups for computational time reasons. Individual pa¬ 
rameters are log-normally distributed. Error model parameters are set to a = 0.2 and b = 0.02. 
Fixed effects parameters are: 

= 0.48 pI = 0.58 
pI-i = 0.060 p^j = 0.042 
= 1 - 47 . 

Random effects variances are: 

12 92 

(Jiy - (l)y - 0.1 

= 0.1 0 J % = 0.21 

w£ = o.i w^;; = o.2i. 

For each data set, the best model is selected using BIC based on the penalized SAEM algorithm 
estimation. For comparison purpose, the selection approach based on a BIC forward stepwise 
method is also implemented. This stepwise method includes 2 steps: i) assuming the variances 
of random effects to be different between the groups, the hxed effect model is selected by BIC 
comparison, ii) using the selected hxed effects model, the variance model is selected by BIC 
comparison. The performance of the two methods is evaluated by comparing the selected model 
to the true model. Table presents the proportion of correctly selected models for the hxed 
effects model, the variances model and the whole model. Table presents the proportion on 
the 50 simulated data sets where a non null difference is detected for each parameters. On this 
synthetic example, our approach enjoys better selection performance than the stepwise approach. 
This is particularly true for variance parameters. However, Table shows that the fused lasso 
approach tends also to select more complex models especially for small sample sizes. Indeed 
Pk^ and {coyf' are theoretically equal in the 2 groups, but the fused lasso estimates a non null 
difference on these two parameters. When the fused lasso approach does not select the true 
model, it generally includes differences that are null in the true model. This is especially true 
when Ng -25. 
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Fixed effects 

Variances 


Both 


Ns 

25 

50 

100 

25 50 100 

25 

50 

100 

Stepwise Forward 

30% 

76% 

68% 

10% 30% 52% 

6% 

24% 

42% 

Fused LASSO 

40% 

74% 

76% 

38% 56% 78% 

14% 

40% 

60% 


Table 4: Simulated data, 2 groups: proportion of correctly selected models on 50 simulated datasets for the fixed ef¬ 
fects model, the variances model and the whole model. Results are given for the fused lasso and the stepwise forwai'd 
approaches. 



Ns 

= 25 

Ns 

= 50 

Ns 

II 

5 

! o 

j 


Fused 

Forward 

Fused 

Forward 

Fused 

Forward 

Ev 

16% 

56% 

94% 

88% 

100% 

94% 

Eci 

100% 

74% 

100% 

96% 

100% 

88% 

EK 

40% 

20% 

20% 

10% 

24% 

16% 

Wy 

20% 

14% 

14% 

6% 

12% 

20% 

"a 

64% 

32% 

88% 

68% 

96% 

88% 

^<7 

72% 

32% 

78% 

50% 

92% 

70% 


Table 5: Simulated data, 2 groups: proportion of the 50 simulated datasets with a non null difference detected for each 
parameter of the model, respectively. Results are given for the fused lasso and the stepwise forward approaches. 


5. Real data analysis 


We now illustrate our approach on a real data example. DE is an orally anticoagulant drug 
used for the prevention of venous thromboembolism after orthopedic surgery and stroke in pa¬ 
tients with atrial fibrillation. Its has a low bioavailability (fraction of administrated dose that 
reaches the systemic circulation) typically below 7%. It is mainly due to a solubility problem 
and to the P-glycoprotein (P-gp) efflux that has an ’’anti-absorption” function. P-gp inhibitors can 
increase Dabigatran bioavailability by improving its absorption ( Delavenne et al.| 2013| l. Then, 
the addition of P-gp inhibitors could also lead to overdosing and adverse event like hemorrhage. 

Data from two cross over clinical trials are considered. The two studies were conducted with 
two different dosing regimens for DE. The first trial, a two way crossover trial with 10 subjects, 
evaluated the interaction between DE (dosing regiment A) and P-Gp inhibithor 1 {Pgpli). The 
second trial, an incomplete three way crossover trial with 9 subjects, evaluated the interaction 



DEj+Pgplj 


DEj+Pgplj 


Figure 5: Graph used for the penalty of DE pooled data. 
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between DE (dosing regiment B), P-Gp inhibithor 2 {Pgph) and P-Gp inhibithor 3 (Pgpl^). 
Data from the two trials are pooled and five groups of subjects are defined; 

• DEa'. DE with dosing regimen A alone (10 subjects). 

• DEa + Pgph'- DE with dosing regimen A alone plus P-Gp inhibithor 1 (10 subjects). 

• DEb'. DE with dosing regimen B (DEb) alone (9 subjects). 


• DEb + Pgph'- DE with dosing regimen B alone plus P-Gp inhibithor 2 (9 subjects). 

• DEb + Pgph'- DE with dosing regimen B alone plus P-Gp inhibithor 3 (9 subjects). 

In each group, dabigatran blood concentration (pharmacokinetic) is measured for each patient 
at 10 sampling times after oral drug administration. The following pharmacokinetic model with 
one compartment and an inverse gaussian absorption is used; 


dAc 

dt 


= IG{t) - 



IG(t) — Dose X F X 


MAT 


2nCVh^ 


-(t-MAT)^ 
X e ICV'^MATt 


where A^ corresponds to the amount of dabigatran in the blood, and the absorption parameters F, 
MAT and CV correspond to bioavailability, mean absorption time and coefficient of variation of 
the absorption rate respectively. Einally parameters Cl and Vc are the elimination clearance and 
the volume of central compartment. Individual parameters are supposed log-normally distributed 
(h(4>) = log(0)). 

Estimating the bioavailability with only data from orally administrated drug is an ill-posed 
problem. Indeed, a decreased value for F could be balanced by smaller V and Cl values. In 
order to regularize this problem, we add prior distributions on both V and Cl fixed parameters 
(Weiss et al. |2012| l based on previously published results (Blech et al. 2008| l. In this case, fixed 
parameters update is done by solving the following optimization problem; 



yj 1 ^ 

■’Ek+i) = ArgMax ^Qk(p^,Q.l,ak,bk) - - pD’Vl ' (p« - pD - ApWPpWi 


with p^ ~ N(p^^, yf) as prior distribution. 

Due to the small number of subjects per group, only differences between groups for the 
bioavailability parameter F are analyzed. The penalized SAEM algorithm is applied to this 
model penalizing hxed effect and random effects variance of bioavailability (F). Parameters Vc, 
Cl, MAT and CV are supposed equal between the groups. High values for the adaptive weights 
were used for the corresponding differences to ensure they are null across groups. This as¬ 
sumption seems reasonable as; i) subjects are highly comparable due to very stringent inclusion 
criterions and ii) P-Gp inhibitors do not seem to influence MAT and CV ( |Delavenne et al.| 2013| l. 
The penalized SAEM algorithm is applied using the graph structure depicted in Eigure and a 
grid composed of 400 pairs of Af and Ay values. 

The optimal model selected by the BIC is shown in Eigure]^ Concerning fixed effects, the 
bioavailability is different between the two dosing regimens. It is certainly the consequence of 
the very low and pH-dependant solubility of DE. As the dosing regimen B was the lowest, then 
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Bioavailability Fixed Effects 


Bioavailability Variances 


3 . 75 % 0.623 



Figure 6: Model selected by the BIC and unpenalized reestimation of the bioavailability parameters from the real data. 
Groups with same color share equal estimates. 



Figure 7: Regularization path for both fixed and variance bioavailability parameters from the pooled DE data set. Red, 
bleu, green, purple and orange lines correspond to DEa, DEA + Pgph^ E)Eb, DEs+Pgph and DEg + Pgph respectively. 


the smaller the dose, the lower the DE solubility is. Among the three P-Gp inhibitors, only 
Pgph is associated to an increase in DE bioavailability. It is not surprising since Pgph is known 
to be a strong P-Gp inhibitor. Pgph and Pgph inhibit P-Gp much less in in-vitro experiment. 
Concerning random effects variances, a higher variance is estimated for dosing regimen B, which 
again is certainly attributable to solubility. Einally, Pigure|7]shows the regularization path of both 
fixed effects and variances. 


6. Discussion 

In this paper, we present a fused lasso penalized version of the SAEM algorithm. It allows 
the introduction of sparsity in the difference between group parameters for both fixed effects and 
variances of random effects. This algorithm is designed to iteratively maximize the penalized 
conditional expectation of the complete data likelihood. Simulation results show that this algo- 
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rithm has good empirical convergence properties. The theoretical study of this algorithm will be 
the scope of future work. 

Several extensions could be proposed. First, the hypothesis that the variance covariance matrix 
is diagonal might be too strong. For example, in pharmacokinetics the clearance Cl and the 
volume of distribution parameter may be strongly correlated. Neglecting this correlation could 
have important consequences on the model prediction properties. Moreover, the penalty used in 
this work does not allow for random effect selection. One way to tackle these two issues would 
be to directly penalize the variance-covariance matrix (instead of its inverse), which could be 


achieved by using the reparametrisation described by (Bondell et al. 20101. 


In this work, group sizes are supposed equal or not too different, which is often the case in 
pharmacokinetic. The algorithm could be easily modified by introducing the group size in the 
sum of the group conditional expectation (Danaher et al. 2013|l: 


Z 


NgQkin^ ,13^ ,Q.l,ak,bk). 


Concerning the selection of tuning parameters, other criterions than BIC have been used for 
generalized linear models. The cross-validated prediction error may be particularly useful espe¬ 
cially for high dimensional data since the unpenalized re-estimation of the log-likelihood can not 
always be done. For NLME, this criterion has already been studied by ( [Colby and Bair 2013| l 
and could be easily implemented. 

Finally a last improvement, subject of a future work, is the extension to NLMEMs including 
more than one level of random effects ( jPanhard and Samsonj |2009| l. Indeed in this paper the 
method is applied to data from a cross-over trial, where each subject receives the two treatment 
modalities. This information was neglected and the five groups were considered as independent 
which could lead to spurious association when inter occasion variability is high. 
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